Creating software for environmental measurements based on GPS data - twsagps package report number 2

Author

Igor Graczykowski

1 Introduction

This report comprehends the present state of twsagps package - current issues and further development plans. Package consists of five functions: exposure_PO(), exposure_KDE(), exposure_DR(), exposure_LS() and rast_stats(). The first four functions each utilize one of Time Weighted Spatial Averaging (TWSA) method. Fifth function calculates various statistics for multiple rasters. The TWSA methods are taken from the reference paper (Jankowska et al. 2023) on which the package is based upon - Point Overlay (PO), Kernel Density Estimation (KDE), Density Ranking (DR) with the addition of Line Segment (LS). Report demonstrates code workflow and outcomes.

For this demo following packages were used:

library(terra)
library(exactextractr)
library(sf)
library(tmap)
library(tmaptools)
library(dplyr)
library(twsagps)

2 Research area and sample data

For presenting package functioning Geolife GPS Trajectories 1.3 Data (Zheng et al. 2009), (Zheng et al. 2008), (Yu Zheng 2010) was used as sample GPS data. The Geolife dataset consists of significant volumes of time-stamped GPS data retrieved from 182 pariticipants in over five year time span. Research area was limited to San Diego county and GPS data was restricted to the research area. To improve running time of functions Geolife data was reduced to each 10th record. The GPS data was collected between 17th August 2011 and 20th August 2011.

geol_file = read.csv("geolife_san_diego.csv")
geolife_days = geol_file |> select(-X) |> 
  slice(which(row_number() %% 10 == 1))
geolife_time = geolife_days |>
  mutate(dateTime = paste(date, time), 
         dateTime = as.POSIXct(dateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
lat lon date time dateTime
33.26486 -117.4323 2011-08-17 21:38:11 2011-08-17 21:38:11
33.26239 -117.4301 2011-08-17 21:38:21 2011-08-17 21:38:21
33.25969 -117.4280 2011-08-17 21:38:31 2011-08-17 21:38:31
33.25700 -117.4259 2011-08-17 21:38:41 2011-08-17 21:38:41
33.25437 -117.4238 2011-08-17 21:38:51 2011-08-17 21:38:51
33.25162 -117.4216 2011-08-17 21:39:01 2011-08-17 21:39:01
Table 1: Geolife records with dateTime

NDVI raster data was applied as sample environmental layer. Indicator was calculated from a Landsat 8 image involving San Diego county which was captured on 22nd Februrary 2024.

landsat_red = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B4.TIF")
landsat_nir = rast("landsat_ndvi/LC09_L2SP_040037_20240222_20240224_02_T1_SR_B5.TIF")
landsat_ndvi = (landsat_nir - landsat_red) / (landsat_nir + landsat_red)
Figure 1: NDVI data (credits - U.S. Geological Survey) with Geolife San Diego data

Vector environmental layer of bike routes lines was utilized as consecutive sample data. Cycling paths span over San Diego county and were sourced from City of San Diego Open Data portal updated at 2024-04-30. Bicycle trail data is reclassified to numeric type based on class attribute indicating category of bike lane.

  • Class I - Paved right-of-way for exclusive use by bicyclists, pedestrians and those using non-motorized modes of travel. They are physically separated from vehicular traffic and can be constructed in roadway right-of-way or exclusive right-of-way (Reclassified as 3)

  • Class II - One-way facilities on either side of a roadway with pavement striping and signage used to allocate a portion of a roadway for exclusive or preferential bicycle travel (Reclassified as 2)

  • Class III - Shared use with motor vehicle traffic within the same travel lane. Designated by signs (Reclassified as 1)

  • Class IV (One-Way)/Class IV (Two-Way) - Exclusive bike facility physically separated from motor traffic and distinct from the sidewalk (Reclassified as 4)

sd_bike = vect("san_diego_bike/bike_routes_datasd.shp")
sd_bike$class = replace(sd_bike$class, sd_bike$class == "Class IV (Two-Way)", "Class IV (One-Way)")
sd_bike$class = factor(sd_bike$class, levels = c("Class III", "Class II", "Class I", "Class IV (One-Way)")) |> as.numeric()
Figure 2: San Diego bike routes lines data with reclassified “class” attribute (credits - City of San Diego)

All sample data is avaible at repository.

3 Methods

3.1 Point Overlay (PO)

Point Overlay method is implemented in exposure_PO() function. Evaluated by converting GPS point SpatVector data to SpatRaster data type using sum function.

3.2 Kernel Density Estimation (KDE)

The exposure_KDE() utilizes the KDE method. Method generates smoothed buffers aroud GPS data considering differences in spatial distribution and influence. Function calculates exposure from GPS points coordinates on matrix of coordinates using spat_kde() created for this package. The spat_kde() function, used in computing KDE, utilizes quartic kernel (Silverman 1986):

\hat{f}(x) = \frac{1}{h^2} \sum^n_{i=1} K \{\frac{1}{h}(x - X_i)\}

and

K_2(x) = \left\{ \begin{array}{rcl} 3\pi^{-1}(1-x^Tx)^2 & if & x^Tx > 1 \\ 0 & \ & otherwise \end{array}\right.

where h is kernel search radius (bandwidth), x current cell coordinates and X_i current data point. KDE formula does not incorporate further division by n number of cells for the purpose of calculating KDE in intensity unit (https://gis.stackexchange.com/a/389744).

3.3 Density Ranking (DR)

Density Ranking method in exposure_DR() is based on DR() function from yenchic/density_ranking GitHub repository [Chen and Dobra (2020)](Chen 2019) by implementing Empirical Cumulative Distrubution Function in spat_dr() function created fot this package. DR serves as a improvement to KDE method by ranking KDE values (Jankowska et al. 2023). Function implements DR formula (Chen and Dobra 2020):

\hat{\alpha}(x) = \sum^n_{i=1}I(\hat{p}(X_i)\leq\hat{p}(x))

The \hat{\alpha}(x) takes value between 0 and 1. Lack of n number of cells division in formula mentioned in Section 3.2.

3.4 Line Segment (LS)

New addition in TWSA approaches is a Line Segment method applied in exposure_LS() drawn from michaeldgarber/microclim-static-v-dynam GitHub repository. Code applied in exposure_LS() consists of modified functions from the repository’s demo. Line Segment puts emphasis on time component of GPS data. Firstly, trajectories following GPS movement. Than buffers around each trajectory segment are created and combined with time elapsed between every two GPS records. Secondly, weighted values for each segment are extracted from environmental layer. Obtained polygons are converted to raster with sum function afterwards.

Demo applies terra::extract() in deriving environmental data values which is substituted in exposure_LS() with exactextractr::exact_extract() function due to significant computation speed difference.

4 Processing

4.1 Parameters

TWSA functions of twsagps package utilizes various arguments:

  • data - data.frame, SpatVector points or sf data.frame containing only POINTS of GPS data

  • x - x or longitude coordinates column name if data is a data.frame

  • y - y or latitude coordinates column name if data is a data.frame

  • NA_val - value in x and y marked as NA if data is a data.frame

  • time_data (LS) - column of data containing POSIXct class objects

  • time_unit (LS) - unit of time weights in time_data

  • cellsize - size of raster cells in meters if output has a longtitude/latitude CRS or in the units of output CRS

  • group_split - column of data based on which it is grouped and split. For n groups output will be n SpatRasters

  • bandwidth (KDE/DR/LS) - bandwidth in units of output CRS (DR/KDE) or size of buffer in meters if output has a longtitude/latitude CRS or in the units of output CRS (LS)

  • env_data - stars, SpatRaster, SpatVector or sf object class of environmental data. Activity space is calculated when not set

  • env_field - column name of SpatVector/sf env_data that env_data will be rasterized on

  • env_buff - optional buffer around SpatVector/sf env_data in meters if output has a longtitude/latitude CRS or in the units of the CRS

  • normalize - if TRUE activity space SpatRaster is normalized

  • norm_method - normalization method - “center”, “scale”, “standardize” or “range”. Default is “range”

  • norm_group - if FALSE each activity space SpatRaster is normalized seperately. If TRUE activity space SpatRasters are normalized to SpatRaster with highest max value. Not ignored when normalize is TRUE, norm_method is “range” and group_split is set

  • time_only_norm (LS) - TEMPORARY - type of time normalization. If TRUE only time_data is normalized. If FALSE and norm_method is “range” normalization is conducted with regard to spatial distribution

  • grid_extent - if stars or SpatRaster grid_extent is output’s grid. If SpatExtent, sf bbox object or numeric vector c(xmin, xmax, ymin, ymax) grid_extent is output’s extent

  • input_crs- CRS of data’s coordinates if data is a data.frame

  • output_crs- CRS of output

  • filepath - output filename

data = geolife_time
x = "lon"
y = "lat"
NA_val = NULL
time_data = "dateTime"
time_unit = "mins"
cellsize = 50
group_split = "date"
bandwidth = 200

# ndvi env_data
env_data = landsat_ndvi
env_field = NULL
env_buff = NULL

# bike env_data
env_data = sd_bike
env_field = "class"
env_buff = 50

normalize = TRUE
norm_method = "range"
norm_group = TRUE
time_only_norm = FALSE
grid_extent = NULL
input_crs = "EPSG:4326"
output_crs = crs(landsat_ndvi)
filepath = NULL

4.2 Entry error handling

With the aim of enhancing versatility of TWSA functions multiple types of objects are managed as arguments. Unhandled and handled classes are filtered and processed to uniform type returning proper messages, warnings and errors.

Input of environmental data must be stars, SpatRaster, SpatVector or sf object class. Stars is converted into SpatRaster and sf into SpatVector.

# handle stars objects rasters as env_data
if (!is.null(env_data)) {
  if (any(class(env_data) == "stars")){
    env_data = terra::rast(env_data)
  } else if (any(class(env_data) == "sf")) {
    env_data = terra::vect(env_data)
  } else if (any(!class(env_data) %in% c("SpatVector", "SpatRaster"))){
    stop("Invalid env_data - env_data neither stars, sf, SpatVector nor SpatRaster class")
  }
}

Grid extent argument handles stars, SpatRaster, SpatExtent, sf bbox class objects and numeric vector of 4 values indicating boundaries of extent (xmin, xmax, ymin, ymax). For grid extent stars class the object is transformed to SpatRaster and sf bbox or vector to SpatExtent.

# handle grid_extent
if (!is.null(grid_extent)){
  if (any(class(grid_extent) == "stars")){
    grid_extent = terra::rast(grid_extent)
  } else if (any(class(grid_extent) == "bbox" || (is.vector(grid_extent) &&
                                                  all(class(grid_extent) == "numeric") && length(grid_extent) == 4))){
    grid_extent = terra::ext(grid_extent)
  } else if (any(!class(grid_extent) %in% c("SpatExtent", "SpatRaster"))){
    stop("Invalid grid_extent - grid_extent neither stars, SpatRaster, SpatExtent, bbox class nor numeric vector of 4 length")
  }
}

Normalize and groups normalization arguments are boolean. They undergo tests to ascertain interpretability as logical, subsequently converted to logical.

# handle normalize and norm_group
if (is.na(as.logical(normalize))){
  stop("Invalid normalize argument - normalize argument cannot be interpreted as boolean")
} else {
  normalize = as.logical(normalize)
}

if (is.na(as.logical(norm_group))){
  stop("Invalid norm_group argument - norm_group argument cannot be interpreted as boolean")
} else {
  norm_group = as.logical(norm_group)
}

Normalization method is required to be one of normalization methods from BBmisc::normalize(). When normalization is conducted and method is not center, scale, standardize or range, argument is assigned to default method - range.

# handle norm_method
if (!norm_method %in% c("center", "scale", "standardize", "range") && normalize == TRUE) {
  warning('Invalid norm_method - applying default normalization method "range"')
  norm_method = "range"
}

4.2.1 Kernel Density Estimation/Density Ranking/Line Segment

In exposure_KDE(), exposure_DR() and exposure_LS() obligatory bandwidth argument is implemented. Bandwidth is constrained to be single positive numeric value.

#handle bandwidth
if (is.null(bandwidth)){
  stop("Missing bandwidth argument. Provide valid bandwidth")
} else if (length(bandwidth) != 1 || !is.numeric(bandwidth) || bandwidth <= 0){
  stop("Invalid bandwidth argument - bandwidth is neither positive nor single numeric value")
}

4.2.2 Line Segment

Line Segment temporary time only norm argument (see Section 6.2) is expected to be logical and is handled analogously to normalize and groups normalization arguments.

# handle time_only_norm
if (is.na(as.logical(time_only_norm))){
  stop("Invalid time_only_norm argument - time_only_norm argument cannot be interpreted as boolean")
} else {
  time_only_norm = as.logical(time_only_norm)
}

4.3 Entry processing

Considering numerous GPS data types, grid extent, environmental layer and CRS arguments GPS data is required to be preprocessed prior to further computation. Initially data frame GPS dataset class is handled. Obligatory for developing spatial data x and y coordinates columns are validated. Afterwards star_processing() function is called.

# get spatial data with correct crs
if (!is.null(data)){
  if (all(class(data) == "data.frame")){
    if (all(!c(is.null(x), is.null(y)))){
      x_enq = rlang::enquo(x)
      y_enq = rlang::enquo(y)

      if (all(c(rlang::quo_name(y_enq), rlang::quo_name(y_enq)) %in% colnames(data))){

        data_proj = start_processing(data = data, x = rlang::quo_name(x_enq),
                                     y = rlang::quo_name(y_enq), NA_val = NA_val,
                                     env_data = env_data, grid_extent = grid_extent,
                                     input_crs = input_crs, output_crs = output_crs)
      } else {
        stop("Invalid x or y arguments - x or y are not a column in data")
      }
    } else {
      stop('Missing x or y arguments for data "data.frame" class')
    }
  } else {
    data_proj = start_processing(data = data, env_data = env_data,
                                 grid_extent = grid_extent, input_crs = input_crs,
                                 output_crs = output_crs)
  }
} else {
  stop("Missing data argument - provide valid data argument")
}

4.3.1 Start_processing function

The start_processing() evaluates target SpatVector points GPS data class incorporating exclusion of NA values and CRS transformations with properly signalled conditions.

Listing 1: start_processing function
Code
start_processing = function(data, x, y, NA_val, env_data, grid_extent,
                            input_crs, output_crs){

  # get spatial data
  if (all(class(data) == "data.frame")) {
    if (is.null(input_crs)){ # implement so missing works
      input_crs = ""
    }
    if (!is.null(NA_val)) {
      if (is.numeric(NA_val)){
        n_row = nrow(data)
        data = data[data[x] != NA_val & data[y] != NA_val,]
        message(paste0("Removing rows containing default NA value ",
                       NA_val, " in x and y columns - removing ", n_row - nrow(data), " rows"))
      } else {
        warning("Invalid NA_val argument - NA_val is not numeric. Ignoring NA_val argument")
      }

    }
    if (any(is.na(data[,c(x,y)]))){
      n_row_NA = nrow(data)
      data = tidyr::drop_na(data, x, y)
      message(paste0("Removing rows containing NA in x and y columns - removing", n_row_NA - nrow(data), " rows"))

    }
    data_points = terra::vect(x = data, geom = c(x, y), crs = input_crs)

  } else if (any(class(data) == "sf") && all(sf::st_geometry_type(data) == "POINT")){
    data_points = terra::vect(data)

    if (!is.null(input_crs)){
      message("Ignoring input_crs argument")
    }

} else if (any(class(data) == "SpatVector") && terra::geomtype(data) == "points") {
    data_points = data
    if (!is.null(input_crs)){
      message("Ignoring input_crs argument")
    }
  } else {
    stop("Object data is neither data.frame, sf nor SpatVector class")
  }

  data_crs = terra::crs(data_points)

  # crs
  if (!is.null(output_crs)){
    if (data_crs == "") {
      message("Setting data crs to output_crs")
      terra::crs(data_points) = output_crs
      data_proj = data_points
    } else {
      message("Projecting data to output_crs")
      data_proj = terra::project(data_points, output_crs)
    }
  } else if (!is.null(grid_extent) && class(grid_extent) == "SpatRaster") {
    if (data_crs == "") {
      message("Setting data crs to grid_extent crs")
      terra::crs(data_points) = terra::crs(grid_extent)
      data_proj = data_points
    } else {
      message("Projecting data to grid_extent crs")
      data_proj = terra::project(data_points, grid_extent)
    }
  } else if (data_crs != "") { # any invalid/empty crs
    data_proj = data_points
  } else if (!is.null(env_data)) {
    message("Setting data crs to environmental data crs")
    terra::crs(data_points) = terra::crs(env_data)
    data_proj = data_points
  } else {
    message("No crs specified")
    data_proj = data_points
  }
  # crop to grid_extent
  if (!is.null(grid_extent)){
    if (class(grid_extent) == "SpatExtent"){
      data_proj = terra::crop(data_proj, grid_extent)
    } else {
      if (terra::crs(data_proj) != terra::crs(grid_extent)){
        grid_rast = terra::project(grid_rast, terra::crs(data_proj))
      }
      data_proj = terra::crop(data_proj, terra::ext(grid_extent))
    }
  }
  
  return(data_proj)
}

At the outset function handles various GPS dataset classes. For data frame class object input CRS is tested. Thereafter, rows containing value from NA_val argument or NA value in x and y columns are omitted. Utilizing processed coordinates columns SpatVector object with input CRS is generated. In cases of sf data frame containing only POINTS GPS data is converted to SpatVector. Points SpatVector and sf class ignore input CRS.

# get spatial data
if (all(class(data) == "data.frame")) {
  if (is.null(input_crs)){
    input_crs = "" # create vector with empty crs
  }
  if (!is.null(NA_val)) {
    if (is.numeric(NA_val)){
      n_row = nrow(data)
      data = data[data[x] != NA_val & data[y] != NA_val,]
      message(paste0("Removing rows containing default NA value ",
                     NA_val, " in x and y columns - removing ", n_row - nrow(data), " rows"))
    } else {
      warning("Invalid NA_val argument - NA_val is not numeric. Ignoring NA_val argument")
    }

  }
  if (any(is.na(data[,c(x,y)]))){
    n_row_NA = nrow(data)
    data = tidyr::drop_na(data, x, y)
    message(paste0("Removing rows containing NA in x and y columns - removing", n_row_NA - nrow(data), " rows"))
  }
  data_points = terra::vect(x = data, geom = c(x, y), crs = input_crs)
} else if (any(class(data) == "sf") && all(sf::st_geometry_type(data) == "POINT")){
  data_points = terra::vect(data)

  if (!is.null(input_crs)){
    message("Ignoring input_crs argument")
  }

} else if (any(class(data) == "SpatVector") && terra::geomtype(data) == "points") {
  data_points = data
  if (!is.null(input_crs)){
    message("Ignoring input_crs argument")
  }
} else {
  stop("Object data is neither data.frame, sf nor SpatVector class")
}

Subsequently, start_process() function manages setting and transformation of GPS CRS. Operation has regard to following hierarchy - output CRS, SpatRaster grid extent, not empty currect SpatVector CRS and environmental dataset. Whilst none of conditions mentioned beforehand are fulfilled, the SpatVector’s CRS remains empty.

data_crs = terra::crs(data_points)

# crs
if (!is.null(output_crs)){
  if (data_crs == "") {
    message("Setting data crs to output_crs")
    terra::crs(data_points) = output_crs
    data_proj = data_points
  } else {
    message("Projecting data to output_crs")
    data_proj = terra::project(data_points, output_crs)
  }
} else if (!is.null(grid_extent) && class(grid_extent) == "SpatRaster") {
  if (data_crs == "") {
    message("Setting data crs to grid_extent crs")
    terra::crs(data_points) = terra::crs(grid_extent)
    data_proj = data_points
  } else {
    message("Projecting data to grid_extent crs")
    data_proj = terra::project(data_points, grid_extent)
  }
} else if (data_crs != "") { # any invalid/empty crs
  data_proj = data_points
} else if (!is.null(env_data)) {
  message("Setting data crs to environmental data crs")
  terra::crs(data_points) = terra::crs(env_data)
  data_proj = data_points
} else {
  message("No crs specified")
  data_proj = data_points
}
Projecting data to output_crs

Finally, SpatVector is cropped to grid extent and returns processed GPS points SpatVector.

# crop to grid_extent
if (!is.null(grid_extent)){
  if (class(grid_extent) == "SpatExtent"){
    data_proj = terra::crop(data_proj, grid_extent)
  } else {
    if (terra::crs(data_proj) != terra::crs(grid_extent)){
      grid_rast = terra::project(grid_rast, terra::crs(data_proj))
    }
    data_proj = terra::crop(data_proj, terra::ext(grid_extent))
  }
}

4.3.2 Line Segment

After processing CRS and SpatVector data in start_processing() Line Segment performs additional validation for time data and omits records with NA value.

# if time_data remove points with NA time_data
if (!is.na(time_data)){
  enq_time = rlang::enquo(time_data)
  time_cname = rlang::quo_name(enq_time)
  if (!time_cname %in% terra::names(data_proj)){
    stop("Invalid time_data argument - time_data is not a column name in data")
  } else if (any(is.na(data_proj[time_cname]))){
    n_row = nrow(data_proj)
    data_proj = tidyterra::drop_na(data_proj,  time_cname)
    message(paste0("Removing points with NA ", time_cname, " values - removing ", n_row - nrow(data_proj), " rows"))
  }
}

4.4 Grid Raster

Upon managing CRS and numerous class objects computing of grid raster is conducted, later utilized in output raster calculation. The grid_calc() function incorporates cellsize, bandwidth, grid extent and environmental raster layer arguments in evaluation of grid raster. In scenarios where the grid extent is a SpatRaster, it is regarded as grid raster, with the calculation of the latter omitted.

if (!is.null(grid_extent) && any(class(grid_extent) == "SpatRaster")){
  if (terra::crs(data_proj) != terra::crs(grid_extent)){
    grid_extent = terra::project(grid_extent, terra::crs(data_proj))
  }
  grid_rast = grid_extent
} else {
  grid_rast = calc_grid(x = data_proj, cellsize = cellsize, env_data = env_data,
                        grid_extent = grid_extent, bandwidth = bandwidth, is_LS = TRUE)
}

4.4.1 Calc_grid function

Listing 2: grid_calc function
Code
calc_grid = function(x, bandwidth, cellsize, env_data, grid_extent, is_LS = FALSE){

  if (!is.null(grid_extent)){ # grid_extent as ext of grid rast

    extent = grid_extent
  } else { # calculate extent
    extent = terra::ext(x)
    if (!is.null(bandwidth)) { # for KDE/DR expand extent by bandwidth
      if(is_LS == FALSE){
        extent = c(terra::xmin(extent) - bandwidth, terra::xmax(extent) + bandwidth,
                   terra::ymin(extent) - bandwidth, terra::ymax(extent) + bandwidth)
      } else { # for LS in lon/lat crs bandwidth is still in meters (terra::buffer)
        temp_buff = terra::buffer(x, bandwidth)
        extent = terra::ext(temp_buff)
      }
    }
  }

  if(!is.null(cellsize) && length(cellsize) == 1 && is.numeric(cellsize) && cellsize > 0) { # cellsize included

    if (suppressWarnings(!is.na(terra::is.lonlat(x)) && terra::is.lonlat(x))){
      # crs units in degrees
      # is.na if empty crs to skip error
      warning("Cellsize is not stable - cells are not rectangular")

      if (!is.null(bandwidth) && bandwidth > 0.1 && is_LS == FALSE) {
        message(paste0("CRS is in lontitude/latitude and bandwidth is ", bandwidth,
      " - bandwidth is calculated in CRS units. Is bandwidth in correct unit?"))
      }

      dist_lon = geosphere::distm(c(extent[1], extent[3]), c(extent[2], extent[3]),
                                  fun = geosphere::distHaversine)
      dist_lat = geosphere::distm(c(extent[1], extent[3]), c(extent[1], extent[4]),
                                  fun = geosphere::distHaversine)

      # number of cells
      x_cells = (dist_lon / cellsize) |> as.integer()
      y_cells = (dist_lat / cellsize) |> as.integer()

      # empty_rast for units in degrees
      grid_rast = terra::rast(crs = terra::crs(x), nrows=y_cells,
                              ncols=x_cells, extent = extent)

    } else {
      grid_rast = terra::rast(crs = terra::crs(x), extent = extent,
                              resolution = cellsize)
    }

  } else if  (!is.null(env_data) && any(class(env_data) == "SpatRaster")){ #if incorrect cellsize and env_data exists
    warning("Cellsize invalid or not set - using cellsize from env_data")

    if (terra::linearUnits(env_data) == terra::linearUnits(x)){
      # project env data with the same cellsize
      env_data_proj = terra::project(env_data, terra::crs(x), res = terra::res(env_data)[1])
      # if env_data in lat/lon only quadratic cells

    } else {
      env_data_proj = terra::project(env_data, terra::crs(x))
    }
    if (is.null(grid_extent)){
      # extent modified to preserve cellsize
      grid_rast = terra::crop(env_data_proj, extent)
    } else {
      # if grid_extent specified and missing cellsize grid_extent is prior to
      # cellsize and cellsize is slightly modified
      grid_rast = env_data_proj
      terra::ext(grid_rast) = extent
    }
  } else {
    stop("Invalid or is.null cellsize - provide valid cellsize or env_data")
  }
  return(grid_rast)
}

At first grid_calc() function evaluates extent of grid rast. For grid extent as SpatExtent, sf bbox object or vector it acts as extent for grid rast. Otherwise, extent is derived from GPS SpatVector data. DR, KDE and LS methods extents are additionally extended for the purpose of preventing cropping cells with values. Kernel Density Estimation and Density Ranking extensions apply bandwidth value. Line Segment, utilizing terra::buffer(), buffers in longitude/latitude CRS are calculated in meters. Therefore, extension by bandwidth in longitude/latitude CRS unit is not viable. Subsequently, the extent is based on boundary box of buffered GPS SpatVector layer.

if (!is.null(grid_extent)){ # grid_extent as ext of grid rast

  extent = grid_extent
} else { # calculate extent
  extent = terra::ext(data_proj)
  if (!is.null(bandwidth)) { # for KDE/DR expand extent by bandwidth
    if(is_LS == FALSE){
      extent = c(terra::xmin(extent) - bandwidth, terra::xmax(extent) + bandwidth,
                 terra::ymin(extent) - bandwidth, terra::ymax(extent) + bandwidth)
    } else { # for LS in lon/lat crs bandwidth is still in meters (terra::buffer)
      temp_buff = terra::buffer(data_proj, bandwidth)
      extent = terra::ext(temp_buff)
    }
  }
}

Afterwards, computation of grid raster using cellsize, extent and SpatVector CRS is performed. Number of cells for longitude/latitude CRS is calculated individually implementing spherical distance to evaluate non-rectangular cells. Additionally, KDE and DR execute further bandwidth size and unit verification with the aim of feasible prevention of excessive memory usage in method specific calculations (Section 4.8). For missing or incorrect cellsize argument environmental SpatRaster is applied. In scenarios where grid extent is set, extent is preserved at the expense of environmental cellsize. Otherwise, extent is modified to maintain cellsize.

if(!is.null(cellsize) && length(cellsize) == 1 && is.numeric(cellsize) && cellsize > 0) { # cellsize included

  if (suppressWarnings(!is.na(terra::is.lonlat(data_proj)) && terra::is.lonlat(data_proj))){
    # crs units in degrees
    # is.na if empty crs to skip error
    warning("Cellsize is not stable - cells are not rectangular")

    if (!is.null(bandwidth) && bandwidth > 0.1 && is_LS == FALSE) {
      message(paste0("CRS is in lontitude/latitude and bandwidth is ", bandwidth,
    " - bandwidth is calculated in CRS units. Is bandwidth in correct unit?"))
    }

    dist_lon = geosphere::distm(c(extent[1], extent[3]), c(extent[2], extent[3]),
                                fun = geosphere::distHaversine)
    dist_lat = geosphere::distm(c(extent[1], extent[3]), c(extent[1], extent[4]),
                                fun = geosphere::distHaversine)

    # number of cells
    x_cells = (dist_lon / cellsize) |> as.integer()
    y_cells = (dist_lat / cellsize) |> as.integer()

    # empty_rast for units in degrees
    grid_rast = terra::rast(crs = terra::crs(data_proj), nrows=y_cells,
                            ncols=x_cells, extent = extent)

  } else {
    grid_rast = terra::rast(crs = terra::crs(data_proj), extent = extent,
                            resolution = cellsize)
  }

} else if  (!is.null(env_data) && any(class(env_data) == "SpatRaster")){ #if incorrect cellsize and env_data exists
  warning("Cellsize invalid or not set - using cellsize from env_data")

  if (terra::linearUnits(env_data) == terra::linearUnits(data_proj)){
    # project env data with the same cellsize
    env_data_proj = terra::project(env_data, terra::crs(data_proj), res = terra::res(env_data)[1])
    # if env_data in lat/lon only quadratic cells

  } else {
    env_data_proj = terra::project(env_data, terra::crs(data_proj))
  }
  if (is.null(grid_extent)){
    # extent modified to preserve cellsize
    grid_rast = terra::crop(env_data_proj, extent)
  } else {
    # if grid_extent specified and missing cellsize grid_extent is prior to
    # cellsize and cellsize is slightly modified
    grid_rast = env_data_proj
    terra::ext(grid_rast) = extent
  }
} else {
  stop("Invalid or missing cellsize - provide valid cellsize or env_data")
}

4.5 Convert SpatVector environmental data

With grid raster computed environmental SpatVector layer is rasterized in env_vect() function utilizing environmental buffer and environmental field arguments. Conversion to SpatRaster is performed on bike routes lines San Diego dataset.

# if env_data is vector data - create optional buffer and rasterize to grid raster
if (!is.null(env_data) && any(class(env_data) == "SpatVector")) {

  env_data = env_vect(env = env_data, env_buff = env_buff,
                      env_field = env_field, grid = grid_rast)
}

4.5.1 Env_vect function

Environmental buffer argument acts as tool to adapt SpatVector environmental datasets (points and lines in particular) to exposure calculation. Single positive numeric value functions as distance of buffer around SpatVector geometries. Environmental field acts as a field for rasterization utilizing sum function.

Listing 3: env_vect function
Code
env_vect = function(env, env_buff, env_field, grid){

  if (!is.null(env_buff)) { # create buffer around vector data
    if (length(env_buff) == 1 && is.numeric(env_buff) && env_buff > 0){
      env_proj = terra::project(env, terra::crs(grid))
      ## TERRA::BUFFER SOMETIMES CRASHES WITH BIG LINE VECTOR DATASETS
      #env = terra::buffer(env_proj, env_buff)
      ## TEMPORARY FIX: USE SF::ST_BUFFER
      env = env_proj |> sf::st_as_sf() |> sf::st_buffer(env_buff) |> terra::vect()
    } else {
      warning("Invalid env_buff argument - ignoring creation of buffer")
    }
  }

  if (!is.null(env_field)){ # choose field to rasterize
    env_f_enq = rlang::enquo(env_field)
    if (rlang::quo_name(env_f_enq) %in% terra::names(env)){
      env = terra::rasterize(env, grid, field = rlang::quo_name(env_f_enq), fun = "sum")
    } else {
      warning("Invalid env_field argument - env_field is not a column name in data. Ignoring env_field argument")
      env = terra::rasterize(env, grid, fun = "sum")
    }
  } else {
    env = terra::rasterize(env, grid, fun = "sum")
  }
  return(env)
}

Buffer around SpatVector bike routes lines San Diego data is calculated.

if (any(class(env_data) != "SpatRaster")){
  if (!is.null(env_buff)) { # create buffer around vector data
    if (length(env_buff) == 1 && is.numeric(env_buff) && env_buff > 0){
      env_data_proj = terra::project(env_data, terra::crs(grid_rast))
      ## TERRA::BUFFER SOMETIMES CRASHES WITH BIG LINE VECTOR DATASETS
      #env_data = terra::buffer(env_data_proj, env_buff)
      ## TEMPORARY FIX: USE SF::ST_BUFFER
      env_data = env_data_proj |> sf::st_as_sf() |> sf::st_buffer(env_buff) |> terra::vect()
      
    } else {
      warning("Invalid env_buff argument - ignoring creation of buffer")
    }
  }
}
Figure 3: Calculated buffer around San Diego bike routes data

Afterwards, buffer SpatVector is rasterized using class environmental field.

if (!is.null(env_field)){ # choose field to rasterize
  env_f_enq = rlang::enquo(env_field)
  if (rlang::quo_name(env_f_enq) %in% terra::names(env_data)){
    env_data = terra::rasterize(env_data, grid_rast, field = rlang::quo_name(env_f_enq), fun = "sum")
  } else {
    warning("Invalid env_field argument - env_field is not a column name in data. Ignoring env_field argument")
    env_data = terra::rasterize(env_data, grid_rast, fun = "sum")
  }
} else {
  env_data = terra::rasterize(env_data, grid_rast, fun = "sum")
}
Figure 4: Rasterized San Diego bike routes lines buffer data

4.5.2 Line Segment

Line Segment requires assigning environmental data raster values to grid raster for further environmental exposure layer computation Section 4.9.2.

if (!is.null(env_data)){ # env_data included
  # project env_data to grid_rast cellsize
  env_data_proj = terra::project(env_data, grid_rast)

  # replace vals of grid with env values
  terra::values(grid_rast) = terra::values(env_data_proj)
}

4.6 Group split

Group split argument enables splitting SpatVector GPS dataset into seperate SpatVectors by any field (day, movement type, participant ID etc.). Further output SpatRaster is calculated for each SpatVector data individually.

if (is.null(group_split)) {
  data_iter = list(data_proj) # only one item for for loop
} else {
  enq_group_split = rlang::enquo(group_split)
  if (rlang::quo_name(enq_group_split) %in% terra::names(data_proj)){
    data_iter = terra::split(data_proj, rlang::quo_name(enq_group_split)) # split data_proj by group_split
    message(paste0("Data split by group into ", length(data_iter), " items"))
  } else {
    data_iter = list(data_proj)
    warning("Invalid group_split argument - group_split is not a column in data. Data not split")
  }
}
Data split by group into 8 items

4.7 Activity Space Loop

Following splitting the data, each SpatVector is iterated in first loop. Activity space SpatRasters (PO/KDE/DR) or buffer time data SpatVectors (LS) are computed.

4.7.1 Point Overlay/Kernel Density Estimation/Density Ranking

In the beginning two empty lists destined to store loops results are generated. For extent and list discussion see Section 6.1. Activity space for each TWSA method is evaluated: PO - terra::rasterize(), KDE - spat_kde() and DR - spat_dr() (See Section 4.8). Every activity space SpatRaster is added as consecutive item in act_out list.

##DR - change spat_kde
# dr_rast =  spat_dr(data_i, grid_rast, bandwidth)
##PO - change spat_kde
# rast_points = terra::rasterize(data_i, grid_rast,  fun = "length")

act_out = list()
output = list()

for (data_i in data_iter){
  # if each group should have seperate extent then output is a list rasts
  # if all groups should have same extent then output is rast with n layers

  ### UNCOMMENT IF EVERY RAST SHOULD HAVE SEPERATE EXTENT
  if (is.null(grid_extent)) {
    #new ext for each group
    # get extent
    group_extent = terra::ext(data_i)
    # new extent - expanded extent by bandwidth
    new_group_extent = c(terra::xmin(group_extent) - bandwidth,
                         terra::xmax(group_extent) + bandwidth,
                         terra::ymin(group_extent) - bandwidth,
                         terra::ymax(group_extent) + bandwidth)

    # crop ext of each rast
    grid_crop = terra::crop(grid_rast, new_group_extent)

    kde_rast = spat_kde(data_i, grid_crop, bandwidth)
  } else {
    kde_rast =  spat_kde(data_i, grid_rast, bandwidth)
  }
  ### UNCOMMENT IF EVERY RAST SHOULD HAVE SEPERATE EXTENT


  ### UNCOMMENT IF EVERY RAST SHOULD HAVE SAME EXTENT
  #kde_rast = spat_kde(data_i, grid_rast, bandwidth)
  ### UNCOMMENT IF EVERY RAST SHOULD HAVE SAME EXTENT
  
  # normalization without groups
  if (normalize == TRUE && (norm_group == FALSE || norm_method != "range" || length(data_iter) == 1)){
    if (norm_method != "range" && norm_group == TRUE) {
      message(paste0('Norm_method is "', norm_method, '" - norm_group is TRUE is applicable only for norm_method "range". Norm group argument ignored. Normalizing each group seperately'))
    }
    # calculate normalization
    terra::values(kde_rast) = BBmisc::normalize(terra::values(kde_rast),
                                                method = norm_method, margin = 2L)
  }

  kde_rast = terra::subst(kde_rast, from = 0, to = NA)

  ### UNCOMMENT IF EVERY RAST SHOULD BE A SEPERATE ELEMENT IN LIST
  act_out[length(act_out) + 1] = as.list(kde_rast)
  ### UNCOMMENT IF EVERY RAST SHOULD BE A SEPERATE ELEMENT IN LIST

  ### UNCOMMENT IF ALL RAST AS STACK RASTER
  #act_out = append(act_out, kde_rast)
  ### UNCOMMENT IF ALL RAST AS STACK RASTER
}

if (normalize == TRUE && norm_method == "range" && norm_group == TRUE && length(data_iter) > 1){

  max_val = max(sapply(act_out, terra::minmax)[2,], na.rm = TRUE)

  act_out = sapply(act_out, function(x) {
    if (!(all(is.na(terra::values(x))))) {
      terra::values(x) = BBmisc::normalize(terra::values(x), method = "range",
                                           margin = 2L,
                                           range = c(0, terra::minmax(x)[2] / max_val))
    }
    return(x)
  })
}
Figure 5: PO, KDE and DR activity space for all days
Figure 6: PO, KDE and DR activity space for 20th August 2011, 21st August 2011 and 23rd August 2011
count area min max range mean std sum
PO 1091 2729650.5 1.0000000 25.0000000 24.0000000 1.5609533 1.9528579 1703.0000000
KDE 19473 48720776.9 0.0000000 0.0019976 0.0019976 0.0000350 0.0001175 0.6811803
DR 6858 17158502.8 0.0258368 0.9994128 0.9735760 0.2210858 0.1874312 1516.2066941
PO_20 145 362787.4 1.0000000 15.0000000 14.0000000 1.9586207 1.9329301 284.0000000
KDE_20 2092 5234140.3 0.0000000 0.0015232 0.0015232 0.0000562 0.0001512 0.1175879
DR_20 622 1556233.0 0.0816327 0.9591837 0.8775510 0.2381827 0.1748179 148.1496599
PO_21 157 392811.8 1.0000000 14.0000000 13.0000000 1.4585987 1.3472806 229.0000000
KDE_21 1980 4953934.1 0.0000000 0.0006602 0.0006602 0.0000469 0.0000750 0.0928017
DR_21 834 2086656.6 0.0043103 0.9612069 0.9568966 0.2755830 0.2262931 229.8362069
PO_23 105 262708.7 1.0000000 6.0000000 5.0000000 1.4095238 0.8804246 148.0000000
KDE_23 1344 3362671.0 0.0000000 0.0005180 0.0005180 0.0000452 0.0000699 0.0607986
DR_23 629 1573749.9 0.0328947 0.9868421 0.9539474 0.2575412 0.2138723 161.9934211
Table 2: Table of statistics for PO, KDE and DR for all days, 20th August 2011, 21st August 2011 and 23rd August 2011

4.7.2 Groups normalization

With implementation of group split argument two seperate groups normalization approaches are derived. First, normalization without groups is performed within activity space loop. It is applicable to all normalization methods. Each SpatRaster undergoes individual normalization using BBmisc:normalize(). Secondly, normalization with groups is conducted following activity space loop’s completion. SpatRaster with highest maximum value is rescaled to 0-1 range. Residual SpatRasters are rescaled in regards of prior SpatRaster. The latter approach is executed solely utilizing range normalization method. Therefore, whilst normalization method is distinct from range, the former is conducted instead.

  if (normalize == TRUE && (norm_group == FALSE || norm_method != "range" || length(data_iter) == 1)){
    if (norm_method != "range" && norm_group == TRUE) {
      message(paste0('Norm_method is "', norm_method, '" - norm_group is TRUE is applicable only for norm_method "range". Norm group argument ignored. Normalizing each group seperately'))
    }
    # calculate normalization
    terra::values(kde_rast) = BBmisc::normalize(terra::values(kde_rast),
                                                method = norm_method, margin = 2L)
  }
Figure 7: Normalization without groups for PO, KDE and DR activity space for 20th August 2011, 21st August 2011 and 23rd August 2011
min max range mean std sum
PO_f_20 0.0000000 1 1.0000000 0.0684729 0.1380664 9.928571
KDE_f_20 0.0000000 1 1.0000000 0.0369021 0.0992412 77.199191
DR_f_20 0.0851064 1 0.9148936 0.2483182 0.1822570 154.453901
PO_f_21 0.0000000 1 1.0000000 0.0352768 0.1036370 5.538462
KDE_f_21 0.0000001 1 0.9999999 0.0709961 0.1136578 140.572211
DR_f_21 0.0044843 1 0.9955157 0.2867052 0.2354260 239.112108
PO_f_23 0.0000000 1 1.0000000 0.0819048 0.1760849 8.600000
KDE_f_23 0.0000002 1 0.9999998 0.0873281 0.1349004 117.368978
DR_f_23 0.0333333 1 0.9666667 0.2609751 0.2167239 164.153333
Table 3: Table of statistics for normalization without groups for PO, KDE and DR for 20th August 2011, 21st August 2011 and 23rd August 2011
if (normalize == TRUE && norm_method == "range" && norm_group == TRUE && length(data_iter) > 1){

  max_val = max(sapply(act_out, terra::minmax)[2,], na.rm = TRUE)

  act_out = sapply(act_out, function(x) {
    if (!(all(is.na(terra::values(x))))) {
      terra::values(x) = BBmisc::normalize(terra::values(x), method = "range",
                                           margin = 2L,
                                           range = c(0, terra::minmax(x)[2] / max_val))
    }
    return(x)
  })
}
Figure 8: Normalization with groups for PO, KDE and DR activity space for 20th August 2011, 21st August 2011 and 23rd August 2011
min max range mean std sum
PO_t_20 0 0.8823529 0.8823529 0.0604173 0.1218233 8.760504
KDE_t_20 0 1.0000000 1.0000000 0.0369021 0.0992412 77.199187
DR_t_20 0 0.9591837 0.9591837 0.1711129 0.1910801 106.432210
PO_t_21 0 0.8235294 0.8235294 0.0290515 0.0853481 4.561086
KDE_t_21 0 0.4334178 0.4334178 0.0307709 0.0492613 60.926389
DR_t_21 0 0.9612069 0.9612069 0.2724946 0.2273124 227.260485
PO_t_23 0 0.3529412 0.3529412 0.0289076 0.0621476 3.035294
KDE_t_23 0 0.3400875 0.3400875 0.0296991 0.0458780 39.915647
DR_t_23 0 0.9868421 0.9868421 0.2323929 0.2212472 146.175136
Table 4: Table of statistics for normalization with groups for PO, KDE and DR for 20th August 2011, 21st August 2011 and 23rd August 2011

4.7.3 Line segment

In the beginning, two empty lists destined to store loops results are generated. Selected extent and output option will be included as Section 6.1 discussion is resolved. Line Segment activity space loop is disparate from PO/KDE/DR loop. Line Segment method’s output is a list of buffer SpatVectors data. Additionally, it follows specific workflow Section 4.8.4. Here, time elapsed is calculated from time data and stored as attribute in buffer SpatVector. Line Segment method offers two ways of normalization (time_only_data argument) further discussed in Section 6.2.

buff_list = list()
output = list()

for (data_i in data_iter) {
  # create line segments from spatial points
  trajectories = terra::vect(trajectories_fun(data_i))

  # create buffers over line segments
  traj_buff = trajectories |>
    tidyterra::select(line_id) |>
    terra::buffer(bandwidth)

  traj_buff$act = 1

  if (!is.null(time_data)){
    #time_data_null = dplyr::enquo(time_data)
    duration_line_id = data_i |>
      dplyr::mutate(time_elapsed = as.numeric(difftime(dplyr::lead(.data[[time_cname]]),
                                                       .data[[time_cname]], units = time_unit)),
                    line_id = dplyr::row_number()) |>
      dplyr::select(line_id, time_elapsed)
    
      # last NA value set to 0 for normalization to not set lowest time value to 0
      duration_line_id$time_elapsed[nrow(duration_line_id)] = 0

    #1st type of normalize - normalize time_elapsed without spatial reference
    if(normalize == TRUE && time_only_norm == TRUE &&
       (norm_group == FALSE || norm_method != "range" || length(data_iter) == 1)) {

      if (norm_method != "range" && norm_group == TRUE) {
        message(paste0('Norm_method is "', norm_method, '" - norm_group is TRUE is applicable only for norm_method "range". Norm group argument ignored. Normalizing each group seperately'))
      }

      duration_line_id$time_elapsed = BBmisc::normalize(duration_line_id$time_elapsed, method = norm_method, margin = 1L)
    }
    
    duration_df = as.data.frame(duration_line_id)

    traj_buff = dplyr::left_join(traj_buff, duration_df, by = 'line_id')
    traj_buff = traj_buff[,-2]
    names(traj_buff)[2] = "act"
  }

  if(normalize == TRUE && (time_only_norm == FALSE || is.null(time_data)) &&
     (norm_group == FALSE || length(data_iter) == 1)){
    #2nd type of normalize - normalize time_elapsed or buffers using spatial reference - rasterizing and reading min max data from raster

      if (norm_method != "range") {
        message(paste0('Norm_method is "', norm_method, '" - time_only_norm is FALSE or time_data is missing. This type of time_only normalization is applicable only for norm_method "range". Setting norm_method to "range"'))
      }

    # slightly varying output values (max very close to 1)

    norm_rast = terra::rasterize(traj_buff, grid_rast, field = "act", fun = "sum")
    #read max val
    max_norm = terra::minmax(norm_rast)[2]

    # normalize
    if (length(unique(traj_buff$act)) == 1){
      #if vector is constant adjust BBmisc
      traj_buff$act = BBmisc::normalize(traj_buff$act, method = "range",
                                        margin = 1L, range = c(0, max(traj_buff$act) / max_norm * 2))
    } else{
      traj_buff$act = BBmisc::normalize(traj_buff$act, method = "range",
                                        margin = 1L, range = c(0, max(traj_buff$act) / max_norm))
    }
  }
  
  buff_list[length(buff_list) + 1] = list(traj_buff)
}
Figure 9: Activity space for Line Segment method with and without time_data

Line Segment group normalization is conducted analogously to PO, KDE and DR activity space loop, with the exception of Line Segment adjustment. Rather than normalizing SpatRaster values, time elapsed attribute is processed.

if(normalize == TRUE && time_only_norm == TRUE && !missing(time_data) &&
   norm_group == TRUE && length(data_iter) > 1 && norm_method == "range"){
  # 1st normalize variant for all rasts


  if (length(data_iter) > 1){

    max_buff = (do.call(rbind, buff_list))$act |> max(na.rm = TRUE)
    buff_list = sapply(buff_list, function(x) {
      if (!(all(is.na(x$act)))) {
        x$act = BBmisc::normalize(x$act, method = "range",
                                             margin = 2L,
                                             range = c(0, max(x$act, na.rm = TRUE) / max_buff))
      }
      return(x)
    })

  } else {

    max_buff = buff_list[[1]]$act |> max(na.rm = TRUE)
    buff_list = sapply(buff_list, function(x) {
      if (!(all(is.na(x$act)))) {
        x$act = BBmisc::normalize(x$act, method = "range",
                                             margin = 2L,
                                             range = c(0, max(x$act, na.rm = TRUE) / max_buff))
      }
      return(x)
    })
  }
} else if (normalize == TRUE && (time_only_norm == FALSE || missing(time_data)) &&
           norm_group == TRUE && length(data_iter) > 1 && norm_method == "range"){
  # 2nd normalize variant for all rasts

  norm_list = sapply(buff_list, terra::rasterize, grid_rast, field = "act", fun = "sum")
  max_multiple_norm = max(sapply(norm_list, terra::minmax)[2,], na.rm = TRUE)
  buff_list = sapply(buff_list, function(x) {
    if (!(all(is.na(x$act)))) {
      x$act = BBmisc::normalize(x$act, method = "range",
                                margin = 2L,
                                range = c(0, max(x$act, na.rm = TRUE) / max_multiple_norm))
    }
    return(x)
  })
}

4.8 Method specific calculations

4.8.1 Point Overlay

Activity space of Point Overlay method is evaluated by converting points vector data to raster using length function.

rast_points = terra::rasterize(data_i, grid_rast, fun = "length")

4.8.2 Kernel Density Estimation

The spat_kde() implemented in exposure_KDE() utilizes kde() function. GPS SpatVector points (x), grid rast (ref) and bandwidth (bw) are processed for the purpose of computing Kernel Density Estimation SpatRaster.

Listing 4: spat_kde function
Code
spat_kde = function(x, ref, bw){

  # coords of each ref cell center
  gx = terra::crds(ref)[,1] |> unique()
  gy = terra::crds(ref)[,2] |> unique()

  # distance from each x/y ref cell center to each x/y x points (squared)

  kde_val = kde(x, gx, gy, bw)

  output_list = list(x = gx, y = gy, z = kde_val)

  # create df of x/y coords and val
  pts = data.frame(expand.grid(x = output_list$x, y = output_list$y),
                   z = as.vector(array(output_list$z,  length(output_list$z))))
  # create points SpatVector
  pts = terra::vect(pts, geom = c("x", "y"))
  # rasterize to ref
  return(terra::rasterize(pts, ref, field = "z"))
}

The kde() function produces KDE exposure matrix from SpatVector points, two numeric vectors indicating columns and rows coordinates of SpatRaster and bandwidth. With the intention of optimazing memory efficiency, computation of each column is performed seperately.

Listing 5: kde function
Code
kde = function(points, ref_uq_x, ref_uq_y, bw) {
  ax <- outer(ref_uq_x, terra::crds(points)[, 1], "-" ) ^ 2
  ay <- outer(ref_uq_y, terra::crds(points)[, 2], "-" ) ^ 2

  # points within quadratic search radius
  ax_T = ax <= bw ^ 2
  ay_T = ay <= bw ^ 2

  # every x row index
  positions = matrix(1:nrow(ax), ncol = 1)

  # calculate KDE for each column seperately

  density_mx = apply(positions,  1, function(xcol) {

    # which point's x coord within possible search radius
    cols_T = which(ax_T[xcol,] == TRUE)

    # matrix cell coordinates of points within quadratic search radius
    rows_T = which(ay_T[,cols_T] == TRUE, arr.ind = TRUE)

    if (length(cols_T) <= 1){ # if only one or zero points within quadratic search radius
      suppressWarnings({rows_T = rows_T |> array(dim = c(length(rows_T), 1)) |> cbind(1)})
      colnames(rows_T) = c("row", "col")
    }
    # calculate distance within quadratic search radius
    df_row = rows_T |> as.data.frame() |>
      dplyr::mutate(sum_val = ay[cbind(row, cols_T[col])] + ax[cbind(xcol, cols_T[col])]) |>
      dplyr::filter(sum_val <= bw ^ 2) |> # filter points within search radius based on real distance
      dplyr::mutate(sum_val = (sum_val / bw ^ 2 * (-1) + 1) ^ 2) # calculate impact of each point on cells

    #create empty matrix
    empty_mx = matrix(NA, nrow = nrow(ay), ncol = length(cols_T))
    # insert values based on cell coordinates
    empty_mx[cbind(df_row$row, df_row$col)] = df_row$sum_val

    # sum impact of every point to calculate KDE
    mx_sum = rowSums(empty_mx, na.rm = TRUE)

    return(mx_sum)})

  # transpose matrix and final KDE calculations
  out = t(density_mx) * (3/pi) / bw ^ 2
  return(out)
}

Calculations are conducted on seven sample GPS Geolife Data SpatVector points located in center of San Diego and cropped to sample GPS data extent 5 x 5 grid raster.

Listing 6: Example kde data
#example ref grid and points
exp_point = geolife_sandiego |> filter(dateTime %in% c("2011-08-18 02:27:11", "2011-08-19 04:24:54", "2011-08-21 03:57:00", "2011-08-18 03:16:43", "2011-08-18 01:26:31", "2011-08-21 04:24:11", "2011-08-21 03:03:14")) |> vect(geom = c("lon", "lat"), crs = "EPSG:4326") |> project(crs(data_proj))
ref_exp = crop(grid_rast, ext(exp_point))
Figure 10: Example data

X coordinates matrix

Example points coordinates
x y
484833.2 3619702
484817.4 3619649
484957.7 3619852
484988.0 3619654
484856.3 3619633
484761.3 3619643
484884.0 3619700

\begin{bmatrix} 484763 &484813 &484863 &484913 &484963 \\484763 &484813 &484863 &484913 &484963 \\484763 &484813 &484863 &484913 &484963 \\484763 &484813 &484863 &484913 &484963 \\484763 &484813 &484863 &484913 &484963 \\ \end{bmatrix}

Y coordinates matrix

\begin{bmatrix} 3619850 &3619850 &3619850 &3619850 &3619850 \\3619800 &3619800 &3619800 &3619800 &3619800 \\3619750 &3619750 &3619750 &3619750 &3619750 \\3619700 &3619700 &3619700 &3619700 &3619700 \\3619650 &3619650 &3619650 &3619650 &3619650 \\ \end{bmatrix}

Column and rows coordinates are retrieved from ref SpatRaster. Subsequently, distance matrixes from every SpatVector point to each column and row is computed.

#points to row and col distance matrixes
gx = terra::crds(ref_exp)[,1] |> unique()
gy = terra::crds(ref_exp)[,2] |> unique()

# distance matrixes
ax = abs(outer(gx, terra::crds(exp_point)[, 1], "-" ))
ay = abs(outer(gy, terra::crds(exp_point)[, 2], "-" ))

# used in further calculation
ax <- outer(gx, terra::crds(exp_point)[, 1], "-" ) ^ 2
ay <- outer(gy, terra::crds(exp_point)[, 2], "-" ) ^ 2

Distance matrix from SpatVector points to columns

\begin{bmatrix} 70 &54 &195 &225 &93 &2 &121 \\20 &4 &145 &175 &43 &52 &71 \\30 &46 &95 &125 &7 &102 &21 \\80 &96 &45 &75 &57 &152 &29 \\130 &146 &5 &25 &107 &202 &79 \\ \end{bmatrix}

Distance matrix from SpatVector points to rows

\begin{bmatrix} 148 &201 &2 &196 &217 &207 &150 \\98 &151 &52 &146 &167 &157 &100 \\48 &101 &102 &96 &117 &107 &50 \\2 &51 &152 &46 &67 &57 &0 \\52 &1 &202 &4 &17 &7 &50 \\ \end{bmatrix}

SpatVector points within search radius (bandwidth) are indicated. Based on distance matrix boolean matrix is derived.

# matrixes with points within search radius T/F
ax_T = ax <= 100 
ay_T = ay <= 100 

# used in further calculation
ax_T = ax <= 100 ^ 2
ay_T = ay <= 100 ^ 2

Search radius of columns matrix

\begin{bmatrix} TRUE &TRUE &FALSE &FALSE &TRUE &TRUE &FALSE \\TRUE &TRUE &FALSE &FALSE &TRUE &TRUE &TRUE \\TRUE &TRUE &TRUE &FALSE &TRUE &FALSE &TRUE \\TRUE &TRUE &TRUE &TRUE &TRUE &FALSE &TRUE \\FALSE &FALSE &TRUE &TRUE &FALSE &FALSE &TRUE \\ \end{bmatrix}

Search radius of rows matrix

\begin{bmatrix} FALSE &FALSE &TRUE &FALSE &FALSE &FALSE &FALSE \\TRUE &FALSE &TRUE &FALSE &FALSE &FALSE &FALSE \\TRUE &FALSE &FALSE &TRUE &FALSE &FALSE &TRUE \\TRUE &TRUE &FALSE &TRUE &TRUE &TRUE &TRUE \\TRUE &TRUE &FALSE &TRUE &TRUE &TRUE &TRUE \\ \end{bmatrix}

Calculations are performed on every column seperately. Retrieve points within search radius of current column and all rows and generate matrix of retrieved points coordinates in rows boolean matrix.

Points within row and column search radius
row col sum_val
2 1 120.624429
3 1 85.065152
4 1 70.156307
5 1 87.222329
4 2 74.716808
5 2 54.413731
4 3 114.849433
5 3 94.808043
4 4 56.811929
5 4 7.005758
xcol = 1

# select points with x coord within possible search radius to single column
cols_T = which(ax_T[xcol, ] == TRUE)
## [1] 1 2 5 6

# matrix cell coordinates of points within quadratic search radius
rows_T = which(ay_T[,cols_T] == TRUE, arr.ind = TRUE)

Compute euclidean distance and indicate points within euclidean search radius. Points beyond search radius are removed.

Points within euclidean search radius
row col sum_val
3 1 85.065152
4 1 70.156307
5 1 87.222329
4 2 74.716808
5 2 54.413731
5 3 94.808043
4 4 56.811929
5 4 7.005758
dist = rows_T |> as.data.frame() |>
      dplyr::mutate(sum_val = ay[cbind(row, cols_T[col])] + ax[cbind(xcol, cols_T[col])])

#filter search radius again
dist_filt = dist |> dplyr::filter(sum_val <= 100 ^ 2)

Further, impact of selected points on value of each cell is evaluated.

# kde calculations
df_row = dist_filt |> dplyr::mutate(sum_val = (sum_val / 100 ^ 2 * (-1) + 1) ^ 2)
Impact of points on each cell value
row col sum_val
3 1 0.0763925
4 1 0.2578702
5 1 0.0572293
4 2 0.1951341
5 2 0.4954957
5 3 0.0102300
4 4 0.4586547
5 4 0.9902080

Empty matrix with ay number of rows and indicated points number of columns is generated. Point impact values are inserted into empty matrix.

#create empty matrix
empty_mx = matrix(NA, nrow = nrow(ay), ncol = length(cols_T))
# insert values based on cell coordinates
empty_mx[cbind(df_row$row, df_row$col)] = df_row$sum_val

Matrix of every column impacting point

\begin{bmatrix} NA &NA &NA &NA \\NA &NA &NA &NA \\0.08 &NA &NA &NA \\0.26 &0.2 &NA &0.46 \\0.06 &0.5 &0.01 &0.99 \\ \end{bmatrix}

Rows of matrix are summed to evaluate entire points impact on each cell.

# sum all points to calc cell value
mx_sum = rowSums(empty_mx, na.rm = TRUE)

\begin{bmatrix} 0 \\0 \\0.076 \\0.912 \\1.553 \\ \end{bmatrix}

Evaluation is repeated for every column in ref SpatRaster computing values for all cells.

\begin{bmatrix} 0 &0 &0.01 &0.64 &0.99 \\0 &0 &0 &0.28 &0.53 \\0.08 &0.59 &0.96 &0.46 &0.02 \\0.91 &2.01 &2.32 &1.07 &0.67 \\1.55 &2.68 &2.47 &1.07 &0.89 \\ \end{bmatrix}

Final KDE calculations are performed and output matrix is returned.

out_kde = t(density_mx) * (3/pi) / 100 ^ 2

\begin{bmatrix} 0 &0 &0.000001 &0.000061 &0.000095 \\0 &0 &0 &0.000027 &0.00005 \\0.000007 &0.000056 &0.000091 &0.000044 &0.000002 \\0.000087 &0.000192 &0.000222 &0.000102 &0.000064 \\0.000148 &0.000256 &0.000236 &0.000102 &0.000085 \\ \end{bmatrix}

Matrix values are converted into ref SpatRaster.

output_list = list(x = gx, y = gy, z = out_kde)

# create df of x/y coords and val
pts = data.frame(expand.grid(x = output_list$x, y = output_list$y),
                 z = as.vector(array(output_list$z,  length(output_list$z))))
# create points SpatVector
pts = terra::vect(pts, geom = c("x", "y"))
# rasterize to ref
out_rast = terra::rasterize(pts, ref_exp, field = "z")
out_rast = terra::subst(out_rast, from = 0, to = NA)
Figure 11: Kernel Density Estimation for example data

4.8.3 Density Ranking

The spat_dr() function implemented in exposure_DR() uses kde() and kde_points() functions to process GPS points (x), grid raster (ref) and bandwidth (bw) into Density Ranking SpatRaster. It utilizes stats::ecdf() to calculate DR from KDE ref raster values and KDE points values.

Listing 7: spat_dr function
Code
spat_dr = function(x, ref, bw) {
  gx = terra::crds(ref)[,1] |> unique()
  gy = terra::crds(ref)[,2] |> unique()

  # distance from each x/y ref cell center to each x/y x points (squared)

  kde_val = kde(x, gx, gy, bw)
  dr_val = kde_points(terra::crds(x)[,1], terra::crds(x)[,2], bw)
  # DR EVAL
  dr_calc = stats::ecdf(dr_val)(as.vector(kde_val))

  output_list = list(x = gx, y = gy, z = dr_calc)

  # create df of x/y coords and val
  pts = data.frame(expand.grid(x = output_list$x, y = output_list$y),
                   z = output_list$z)
  # create points SpatVector
  pts = terra::vect(pts, geom = c("x", "y"))
  # rasterize to ref
  return(terra::rasterize(pts, ref, field = "z"))
}

The kde_points() calculates KDE of every SpatVector point (instead of cells as in kde()). Function utilizes two vectors of SpatVector point’s x and y coordinates and bandwidth. Similarly to kde(), kde_points() computation is segmented into seperate points to optimize memory efficiency. Calculations are executed on example data produced beforehand in Listing 6.

Listing 8: kde_points function
Code
kde_points = function(vect_x, vect_y, bw){
  # apply for each cell coords seperately
  kde_p = mapply(function(x, y) {
    #distance betweeen each point
    ax_p = (vect_x - x) ^ 2
    ay_p = (vect_y - y) ^ 2

    # boolean if dist within search radius
    ax_p_T = ax_p <= bw ^ 2
    ay_p_T = ay_p <= bw ^ 2

    # choose dist^ 2 within search radius
    x_T = which(ax_p_T == TRUE)
    y_T = which(ay_p_T == TRUE)

    # index of coords of which both x and y within search radius
    xy_T = intersect(x_T, y_T)

    # calculate dist ^ 2 and choose thoso within search radius
    xy_val = ax_p[xy_T] + ay_p[xy_T]
    xy_val = xy_val[xy_val < bw ^ 2]

    #
    xy_calc = (xy_val / bw ^ 2 * (-1) + 1) ^ 2

    xy_out = sum(xy_calc)


  }, x = vect_x, y =  vect_y) * (3/pi) / bw ^ 2
  
  return(kde_p)
}
# same example points as in kde()

vect_x = terra::crds(exp_point)[,1]
## [1] 484833.2 484817.4 484957.7 484988.0 484856.3 484761.3 484884.0

vect_y = terra::crds(exp_point)[,2]
## [1] 3619702 3619649 3619852 3619654 3619633 3619643 3619700

First pair of coordinates is retrieved.

xi = vect_x[1]
## [1] 484833.2

yi = vect_y[1]
## [1] 3619702

Initially, distance between current point and each point’s x and y coordinates is derived.

# distance between each point's row and column

abs(vect_x - xi)
## [1]   0.00000  15.73158 124.58766 154.83885  23.13644  71.87595  50.88448

abs(vect_y - yi)
## [1]   0.000000  53.075746 150.130624  48.124734  68.876543  58.643418   1.962094

# used in further calculation
ax_p = (vect_x - xi) ^ 2
ay_p = (vect_y - yi) ^ 2

Indicate points within search radius (bandwidth). Based on point distance vector boolean vector is derived.

# points within search radius in x and y

ax_p_T = ax_p <= 100 ^ 2
## [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

ay_p_T = ay_p <= 100 ^ 2
## [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE

Afterwards, retrieve points within search radius of current point x and y coordinates and generate vector of retrieved points indexes in x and y coordinates vectors.

# intersect x and y to get indexes of point coords within search radius

x_T = which(ax_p_T == TRUE)
## [1] 1 2 5 6 7

y_T = which(ay_p_T == TRUE)
## [1] 1 2 4 5 6 7

xy_T = intersect(x_T, y_T)
## [1] 1 2 5 6 7

Compute euclidean distance and indicate points within euclidean search radius. Points beyond search radius are removed.

# calc dist and search radius filter again
xy_val = sqrt(ax_p[xy_T] + ay_p[xy_T])
## [1]  0.00000 55.35808 72.65860 92.76423 50.92230

xy_val[xy_val < 100]
## [1]  0.00000 55.35808 72.65860 92.76423 50.92230

# used in further calculation
xy_val = ax_p[xy_T] + ay_p[xy_T]
xy_val = xy_val[xy_val < 100 ^ 2]

Process impact of points calculations and sum all values to compute entire points impact on current point.

# additional calc and sum all rows 
xy_calc = (xy_val / 100 ^ 2 * (-1) + 1) ^ 2
## [1] 1.00000000 0.48100918 0.22285265 0.01945458 0.54862461

xy_out = sum(xy_calc)
## [1] 2.271941

Evaluation is repeated for all points and final KDE calculations are performed.

kde_p
## [1] 2.271941 2.711617 1.000000 1.000000 2.135161 1.491677 1.861494
kde_p_out = kde_p * (3/pi) / 100 ^ 2
## [1] 0.00022 0.00026 0.00010 0.00010 0.00020 0.00014 0.00018

Formerly computated grid KDE and point KDE are utilized in Density Ranking using Empiracal Cumulative Distribution Function, ranking KDE values.

dr_calc = stats::ecdf(kde_p_out)(as.vector(out_kde))
##  [1] 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.57 0.86 0.29 0.00
## [21] 0.43 0.86 0.86 0.29 0.00

\begin{bmatrix} 0 &0 &0 &0 &0 \\0 &0 &0 &0 &0 \\0 &0 &0 &0 &0 \\0 &0.57 &0.86 &0.29 &0 \\0.43 &0.86 &0.86 &0.29 &0 \\ \end{bmatrix}

Vector values are converted into ref SpatRaster.

output_dr_list = list(x = gx, y = gy, z = dr_calc)

# create df of x/y coords and val
pts_dr = data.frame(expand.grid(x = output_dr_list$x, y = output_dr_list$y),
                 z = output_dr_list$z)
# create points SpatVector
pts_dr = terra::vect(pts_dr, geom = c("x", "y"))
# rasterize to ref
dr_out_rast = terra::rasterize(pts_dr, ref_exp, field = "z")
dr_out_rast = terra::subst(dr_out_rast, from = 0, to = NA)
Figure 12: Density Ranking for example data

4.8.4 Line Segment

The expoosure_LS() function applies trajectories_fun() to evaluate trajectories from every two GPS data SpatVector points (ordered).

Listing 9: tracjectories_fun function
#Line segment trajectories

trajectories_fun = function(data){

  t_name = c(names(data))[grepl('time', c(names(data)))]
  lst_name = paste(t_name[which.max(nchar(t_name))], "_1", sep = "")

  trajectories_out = data |>
    sf::st_as_sf() |> # change to sf object to change points to linestring later
    #filter to the study id defined by the argument
    #define start and end points of line
    dplyr::mutate(
      line_id = dplyr::row_number(),#an id for each "line segment"
      x_start= sf::st_coordinates(geometry)[,1],
      y_start= sf::st_coordinates(geometry)[,2],
      x_end = dplyr::lead(x_start),
      y_end = dplyr::lead(y_start)
    ) |>
    dplyr::ungroup() |>
    sf::st_set_geometry(NULL) |>
    #exclude the last observation, which has no "lead", and will be missing.
    dplyr::filter(is.na(x_end)==FALSE) |>
    # Make the data long form so that each point has two observations
    tidyr::pivot_longer(
      # select variables to pivot longer.
      cols = c(x_start, y_start, x_end, y_end),
      #value goes to "x/y", and time goes to "_start/end"
      names_to = c(".value", lst_name),
      names_repair = "unique",
      names_sep = "_"#the separator for the column name
    ) |>
    # create sf object once again
    sf::st_as_sf(coords = c("x", "y"), crs= sf::st_crs(data)) |>
    dplyr::group_by(line_id) |>
    #see Edzer's answer here:https://github.com/r-spatial/sf/issues/851
    #do_union=FALSE is needed.
    dplyr::summarize(do_union = FALSE) |>
    sf::st_cast("LINESTRING") |> # cast linestring type
    sf::st_as_sf() |>
    dplyr::ungroup()

  return(trajectories_out)
}

Subsequently, buffer around all trajectories is evaluated, later applied in normalization and output SpatRaster calculations.

# create line segments from spatial points
trajectories = terra::vect(trajectories_fun(data_i))

# create buffers over line segments
traj_buff = trajectories |>
  tidyterra::select(line_id) |>
  terra::buffer(bandwidth)

traj_buff$act = 1
Figure 13: Line Segment buffers with trajectories

4.9 Environmental exposure loop

4.9.1 Point Overlay/Kernel Density Estimation/Density Ranking

Second, environmental exposure, loop for calculating environmental exposure is implemented. Environmental exposure layer is computed by multiplying activity space layer and environmental data layer.

for (out in act_out) {
  if (!is.null(env_data)){ # calculate exposure
    # project env_data to grid
    env_data_proj = terra::project(env_data, out)
    # if same ext for groups out -> grid_rast and this code chunk out of loop

    #env_data_resamp = terra::resample(env_data_proj, out)

    rast_env_kde = out * env_data_proj
    end_rast = rast_env_kde
  } else {
    end_rast = out
  }
  if (length(data_iter) == 1) { # if one group or no group output is not a list
    output = end_rast
  } else {
    output[length(output) + 1] = as.list(end_rast)
  }
}

4.9.2 Line Segment

Computing environmental exposure layer in Line Segment method utilizes grid raster with assigned environmental data values. Grid raster is extracted to every SpatVector trajectory buffer, converted to one data frame and summarized by area overlap for each segment afterwards. Summarized exposure data is joined with SpatVector trajectory buffer and multiplied by normalized or not normalized time weights. Finally, vector data is converted to raster based on exposure and time weights with sum function. Activity space calculation bypasses extraction of environmental values and rasterizes time weights to grid raster using sum function.

for (buff in buff_list) {

  if (!missing(env_data)){ #all of computation necessary only when calculating env_data
    # extract values and weights (area overlap) from grid for each cell of buffer
    exac_extr = exactextractr::exact_extract(grid_rast, sf::st_as_sf(buff), progress = FALSE)

    traj_extract = Map(function(x, id) {x$line_id = id
    return(x)}, x = exac_extr, seq_along(exac_extr)) |> dplyr::bind_rows() |>
      dplyr::as_tibble() |>
      dplyr::relocate(line_id) |>
      dplyr::rename(e = 2)

    # calculate summarised exposure for each buffer
    traj_extract_line_id = traj_extract |>
      dplyr::group_by(line_id) |>
      dplyr::summarise(
        #Jan 9, 2024 use R's built-in weighted.mean() function
        #instead of calculating weighted average manually
        end_weights=stats::weighted.mean(
          x=e,
          w=coverage_fraction,
          na.rm=TRUE),
        #These weights are based on the areal overlap, not time
        sum_weights = sum(coverage_fraction,na.rm=TRUE),
        n_pixel = dplyr::n() # number of observations corresponds to number of pixels per line segment
      ) |>
      dplyr::ungroup()

    # join weights with spatial buffer
    weight_buff = dplyr::inner_join(buff, traj_extract_line_id, by = 'line_id')

    weight_buff$end_weights = weight_buff$end_weights * weight_buff$act

    # rasterize results
    rast_segment = terra::rasterize(weight_buff, grid_rast, field = "end_weights", fun = "sum")
  } else {
    rast_segment = terra::rasterize(buff, grid_rast, field = "act", fun = "sum")
  }
  if (length(data_iter) == 1) { # if one group or no group output is not a list
    output = rast_segment
  } else {
    output[length(output) + 1] = as.list(rast_segment)
  }
}
line_id e coverage_fraction
1 0.1509941 0.2779211
1 0.3074401 0.8561569
1 0.2196989 1.0000000
1 0.2476714 1.0000000
1 0.2691399 0.8558987
1 0.2917317 0.2785110
1 0.0974092 0.2394574
1 0.0545777 0.9830808
1 0.1115729 1.0000000
1 0.2017748 1.0000000
Table 5: traj_extract - Extracted values for each raster cell
line_id end_weights sum_weights n_pixel
1 0.1818593 104.5210 127
2 0.1825086 107.2911 132
3 0.1844797 107.2188 135
4 0.1678028 105.7782 135
5 0.1756642 109.1286 137
6 0.2133048 111.2900 142
7 0.1915245 109.1738 137
8 0.1610415 110.9404 138
9 0.1730885 110.1454 137
10 0.1926440 103.0547 128
Table 6: traj_extract_line_id - Extracted values for each line segment with weights
Figure 14: Weighted line segments buffers

4.10 Writing to filepath

Splitting SpatVector data imposes adjusting filepaths for writing multiple output SpatRasters. In scenarios where number of groups is greater than 1 unique filepaths based on set path are created.

if (length(data_iter) > 1 && !is.null(filepath)) {   
  # list output 
  
  # for iterating file names   
  file_ext = stringi::stri_extract(filepath, regex = "\\.(\\w+)$")   
  file_no_ext = substr(filepath, 1, nchar(filepath) - nchar(file_ext))
  
  file_vect = rep(file_no_ext, length.out = length(data_iter))   
  file_vect = paste0(file_vect, "_", seq_along(file_vect), file_ext) 
}

# for filepath = "raster.tif"
file_vect
## [1] "raster_1.tif" "raster_2.tif" "raster_3.tif" "raster_4.tif" "raster_5.tif" "raster_6.tif" "raster_7.tif" "raster_8.tif"

Write every output SpatRaster to filepaths defined beforehand.

if (!is.null(filepath)) { # save raster
  if (length(data_iter) > 1){ # update filepath

    mapply(function(x, y) {
      terra::writeRaster(x, filename = y)
      message(paste0("Saving output to ", y))
    }, x = output, y = file_vect)

  } else {

    terra::writeRaster(end_rast, filename = filepath)
    message(paste0("Saving output to ", filepath))
  }
}

4.11 Output

Output of twsagps package functions is a single SpatRaster or a list of activity space or environmental exposure SpatRasters.

Figure 15: Environmental exposure layers for NDVI data
count area min max range mean std sum
PO_ndvi 1091 2729650 -0.4050649 1.0608427 1.4659077 0.1067366 0.1088869 116.4496332
KDE_ndvi 19473 48720777 -0.0000137 0.0002946 0.0003083 0.0000028 0.0000074 0.0549635
DR_ndvi 6858 17158503 -0.0561597 0.1809338 0.2370935 0.0183426 0.0196866 125.7934349
LS_ndvi 20904 52301083 -0.0810223 527.6416016 527.7226238 2.5270035 26.2159897 52824.4801867
Table 7: Table of statistics for all methods and NDVI data
Figure 16: Environmental exposure layers for San Diego bike routes lines data
count area min max range mean std sum
PO_bike 277 693047.3 1.0000000 84.0000000 83.0000000 7.2815884 9.4647429 2017.0000000
KDE_bike 2924 7315749.8 0.0000000 0.0139834 0.0139834 0.0002593 0.0008235 0.7581425
DR_bike 1391 3480242.4 0.0299472 9.0428655 9.0129184 1.2824664 1.3475486 1783.9107457
LS_bike 9803 24526747.7 0.1850707 29764.5585938 29764.3735231 299.7464519 2131.9429532 2938414.4679463
Table 8: Table of statistics for all methods and San Diego bike routes lines data

5 Raster statistics

Function rast_stats() calculates raster statistics utilizing terra::global(), terra::freq() and terra::expanse() functions. Statistics for any number of rasters are apposed in data frame.

Listing 10: rast_stats function
Code
# Statistics calculation function

rast_stats =  function(..., stats, row_names = NULL){

  elipsis_list = list(...)
  el_class = sapply(elipsis_list, class)
  if (!all(el_class == "SpatRaster")){
    if (any(el_class == "SpatRaster")){
      warning("Skipping not SpatRaster class arguments")
      elipsis_list = elipsis_list[el_class == "SpatRaster"]
    } else {
      stop("None of arguments is SpatRaster class")
    }
  }

  if (any(sapply(elipsis_list, function(x) {terra::crs(x)}) == "")  && "area" %in% stats) { # when
    warning("Empty crs of one of arguments - area statistic will not be calculated")
    stats = stats[!stats == "area"]
  }
  if (missing(stats) || length(stats) == 0){
    stop("No statistic to be calculated")
  }

  df = data.frame(matrix(ncol = length(stats)))[-1,]
  for (raster in elipsis_list){

    row_df = data.frame(matrix(NA, ncol=1, nrow=1))[-1]
    for (statistic in stats){

      column = switch(
        statistic,
        range = {range = terra::global(raster, fun = "range", na.rm = TRUE)
        val = range[2] - range[1]
        colnames(val) = "range"
        val},
        count = {raster |> terra::freq() |> dplyr::summarise(count = sum(count))},
        area = {raster |> terra::expanse() |> dplyr::select(area)},
        terra::global(raster, fun = statistic, na.rm = TRUE)
      )
      row_df = cbind(row_df, column)
    }
    df = rbind(df, row_df)
  }

  if (is.null(row_names)){
    el_names = names(elipsis_list)
    names_repl = which(el_names == "")
    el_names[names_repl] = names_repl
    rownames(df) = el_names
  } else if (length(row_names) == length(elipsis_list)) {
    rownames(df) = row_names
  } else {
    warning("Incorrect length of row_names argument - default output row names")
  }

  return(df)
}

Entirety of statistics tables are calculated using rast_stats() (Table 2, Table 3, Table 4, Table 8, Table 7).

6 Package issues

6.1 Extent and output

Splitting SpatVector GPS data introduces issues regarding SpatRaster extent and output format. Whilst considering extent with multiple SpatRaster, it may be adressed doubly. SpatRasters has various extents or all SpatRasters maintain identical extent. First, various extents are calculated by cropping grid raster to the minimal feasible extent, preserving grid raster cellsize and converting SpatVector data to cropped grid raster. It is solely overwritten by grid extent argument. Output of this approach must be a list of SpatRasters. Additionally, it enables avoiding disproportion between broad extents and data with clustered spatial distribution.

# calculate seperate extents

if (missing(grid_extent)) {
  #new ext for each group
  # get extent
  group_extent = terra::ext(data_i)
  # new extent - expanded extent by bandwidth
  new_group_extent = c(terra::xmin(group_extent) - bandwidth,
                       terra::xmax(group_extent) + bandwidth,
                       terra::ymin(group_extent) - bandwidth,
                       terra::ymax(group_extent) + bandwidth)

  # crop ext of each rast
  grid_crop = terra::crop(grid_rast, new_group_extent)
  
  kde_rast = spat_kde(data_i, grid_crop, bandwidth) # method specific calculation
} else {
  kde_rast = spat_kde(data_i, grid_rast, bandwidth) # method specific calculation
}

# add to list
act_out[length(act_out) + 1] = as.list(kde_rast)

Secondly, identical extent across every SpatRaster is accomplished by rasterizing all SpatVectors to grid raster without cropping. Uniform extents allows implementing alternative output type. Instead of output list of several items, output is generated as singular SpatRaster with multiple layers. Second approach proposes significantly more cohesive output, facilitating usage of data and allowing saving it to a single file.

# calculate same extent
kde_rast = spat_kde(data_i, grid_rast, bandwidth) # method specific calculation

# add SpatRaster as another layer
act_out = append(act_out, kde_rast)

# all items iterated
act_out
## class       : SpatRaster 
## dimensions  : 1607, 736, 8  (nrow, ncol, nlyr)
## resolution  : 50, 50  (x, y)
## extent      : 459538, 496338, 3600575, 3680925  (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 11N (EPSG:32611) 
## source(s)   : memory
## names       :         last,         last,         last,         last,         last,         last, ... 
## min values  : 1.081063e-14, 4.887112e-15, 6.871992e-11, 2.675202e-12, 9.057847e-11, 2.405132e-09, ... 
## max values  : 7.885379e-05, 1.000735e-03, 7.455308e-04, 1.523175e-03, 6.601711e-04, 7.408324e-05, ... 

6.2 Line Segment normalization

Due to different workflow of Line Segment method applying activity raster normalization is problematic and cannot be accomplished solely by rescaling activity raster values. Thus two approaches of Line Segment are derived and implemented in time_only_norm argument.

First method of time normalization uses only time elapsed and utilizes BBmisc::normalize() on column of time data. Following computation of time elapsed, data frame last value (which is NA from diff_time() function) is set to 0 for proper normalization values.

# calculate time elapsed from all points
duration_line_id = data_proj |>
      dplyr::mutate(time_elapsed = as.numeric(difftime(dplyr::lead(.data[[time_cname]]), .data[[time_cname]], units = time_unit)),line_id = dplyr::row_number()) |>
      dplyr::select(line_id, time_elapsed)



# max time
max(duration_line_id$time_elapsed, na.rm = TRUE)
## [1] 2272.633

# min time
min(duration_line_id$time_elapsed, na.rm = TRUE)
## [1] 0.1666667

# last NA value set to 0 for normalization to not set lowest time value to 0
duration_line_id$time_elapsed[nrow(duration_line_id)] = 0

# first 50 normalized values
head(round(BBmisc::normalize(duration_line_id$time_elapsed, method = "range", margin = 1), 5), n =50)
##  [1] 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007
## [12] 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007
## [23] 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007
## [34] 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007 0.00007
## [45] 0.00007 0.00007 0.00007 0.00007 0.00010 0.00012

Second approach of normalization of Line Segment method is rasterization of SpatVector trajectory buffers. Maximal value from SpatRaster is retrieved and implemented in normalizing buffer SpatVector attribute. It enables normalizing Line Segment method without time data. It is only applicable to range normalization method.

duration_df = as.data.frame(duration_line_id)

trajectories = terra::vect(trajectories_fun(data_proj))

# create buffers over line segments
traj_buff = trajectories |>
  tidyterra::select(line_id) |>
  terra::buffer(bandwidth)

traj_buff$act = 1
traj_buff = dplyr::left_join(traj_buff, duration_df, by = 'line_id')
traj_buff = traj_buff[,-2]
names(traj_buff)[2] = "act"

norm_rast = terra::rasterize(traj_buff, grid_rast, field = "act", fun = "sum")
#read max val
max_norm = terra::minmax(norm_rast)[2]

# normalize
if (length(unique(traj_buff$act)) == 1){
  #if vector is constant adjust BBmisc
  traj_buff$act = BBmisc::normalize(traj_buff$act, method = "range",
                                    margin = 1L, range = c(0, max(traj_buff$act) / max_norm * 2))
} else{
  traj_buff$act = BBmisc::normalize(traj_buff$act, method = "range",
                                    margin = 1L, range = c(0, max(traj_buff$act) / max_norm))
}

# max raster val
minmax(norm_rast)[2]
## [1] 7317.602

# min raster val
minmax(norm_rast)[1]
## [1] 0.1666667

# first 50 normalized values
BBmisc::normalize(traj_buff$act, method = "range", margin = 1L, range = c(0, max(traj_buff$act) / max_norm)) |> round(9) |> head(n=50)
##  [1] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
##  [8] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## [15] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## [22] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## [29] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## [36] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000
## [43] 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000001
## [50] 0.000000002

Latter normalization’s output resembles PO/KDE/DR normalization more than former approach though maximal value after converting SpatVector to SpatRaster is approximate to 1.

# min norm 1
minmax(norm_1)[1]
## [1] 0.00007333636

# max norm 1
minmax(norm_1)[2]
## [1] 3.219877

# min norm 2
minmax(norm_2)[1]
## [1] 0

# max norm 2
minmax(norm_2)[2]
## [1] 0.9964741
Figure 17: Line Segment method - two types of normalization

6.3 Name

Package requires a name more memorable than twsagps. Several new acronyms are proposed:

TWIGPS - Time-Weighted Integrated GPS (Time-Weighted spatial averaging methods Integrated with GPS) - personal favourite

HEXOSIS - Human (or Health) Environmental eXposure Over Space and time with Integrated GPS

HEXGPS - Human (or Health) Environmental eXposure with GPS

GPSHEAL - GPS-based Health and Environmental Assessment Locator

7 Further development

The BBmisc::normalize() and terra::writeRaster() functions possess great amount of not applied arguments that the user may consider utilizing. Incorporating those arguments as elipsis in exposure functions may be beneficial.

The spat_kde() and spat_dr() (along with kde() and kde_points()) functions features should be extented. Establishing those functions as standalone package function will enlarge and enrich twsagps application range.

Further work on twsagps package should focus on developing and enhancing documentation and error handling system.

References

Chen, Yen-Chi. 2019. “Generalized Cluster Trees and Singular Measures.” The Annals of Statistics 47 (4). https://doi.org/10.1214/18-aos1744.
Chen, Yen-Chi, and Adrian Dobra. 2020. “Measuring Human Activity Spaces from GPS Data with Density Ranking and Summary Curves.” The Annals of Applied Statistics 14 (1). https://doi.org/10.1214/19-aoas1311.
Jankowska, Marta M., Jiue-An Yang, Nana Luo, Chad Spoon, and Tarik Benmarhnia. 2023. “Accounting for Space, Time, and Behavior Using GPS Derived Dynamic Measures of Environmental Exposure.” Health & Place 79 (January): 102706. https://doi.org/10.1016/j.healthplace.2021.102706.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. Vol. 37. 1. Chapman Hall. http://books.google.com/books?hl=en&lr=&id=e-xsrjsL7WkC&oi=fnd&pg=PR9&dq=Density+Estimation+for+Statistics+and+Data+Analysis&ots=ivSonp7D_q&sig=XfZFlEyzmSO4nm54dgq22EiW9iA.
Yu Zheng, Wei-Ying Ma, Xing Xie. 2010. “GeoLife: A Collaborative Social Networking Service Among User, Location and Trajectory.” IEEE Data Engineering Bulletin 33 (2): 32–44. https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/GeoLife-A20Collaborative20Social20Networking20Service20among20User2C20Location20and20Trajectory.pdf.
Zheng, Yu, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. 2008. “Understanding Mobility Based on GPS Data.” Proceedings of the 10th International Conference on Ubiquitous Computing, September. https://doi.org/10.1145/1409635.1409677.
Zheng, Yu, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. 2009. “Mining Interesting Locations and Travel Sequences from GPS Trajectories.” Proceedings of the 18th International Conference on World Wide Web, April. https://doi.org/10.1145/1526709.1526816.